24 research outputs found
Predicting wikipedia infobox type information using word embeddings on categories
Wikipedia has emerged as the largest multilingual, web-based general reference work on the Internet. Enormous human effort has been invested in creating and updating Wikipedia articles, which are ideally complemented by so-called infobox templates that define the type of the underlying article. It has been observed that Wikipedia infobox type information is often incomplete and inconsistent for various reasons. However, this information plays a fundamental role in the RDF type information of Wikipedia-based Knowledge Graphs such as DBpedia, which creates the need for correct and complete infobox types. In this work, we propose an approach to predict Wikipedia infobox types by using word embeddings on the categories of Wikipedia articles, and we analyze the impact of using minimal information from the Wikipedia articles in the prediction process.
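The core idea can be sketched as follows: average the embeddings of an article's category names and assign the infobox type whose representative vector is closest. The toy 3-d embeddings, type centroids, and category names below are purely illustrative assumptions, not the paper's actual model or data.

```python
# Minimal sketch: predict an infobox type from the word embeddings of an
# article's categories by averaging them and picking the nearest type centroid.
# Embeddings and type centroids here are toy values for illustration only.

import math

EMBEDDINGS = {
    "footballers": [0.9, 0.1, 0.0],
    "sportspeople": [0.8, 0.2, 0.1],
    "rivers": [0.0, 0.9, 0.2],
    "geography": [0.1, 0.8, 0.3],
}

TYPE_CENTROIDS = {
    "Person": [0.85, 0.15, 0.05],
    "Place": [0.05, 0.85, 0.25],
}

def mean_vector(words):
    """Average the embeddings of the known category words."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def predict_type(categories):
    """Return the infobox type whose centroid is closest to the category mean."""
    v = mean_vector(categories)
    return max(TYPE_CENTROIDS, key=lambda t: cosine(v, TYPE_CENTROIDS[t]))

print(predict_type(["footballers", "sportspeople"]))  # Person
```

In practice the embeddings would come from a model trained on Wikipedia text, and the classifier would be learned rather than a fixed nearest-centroid rule.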
DORIS: Discovering Ontological Relations In Services
We propose to demonstrate DORIS, a system that automatically maps the schema of a Web service to the schema of a knowledge base. Given only the input type and the URL of the Web service, DORIS executes a few probing calls and deduces an intensional description of the Web service. In addition, it computes an XSLT transformation function that can transform a Web service call result in XML into RDF facts in the target schema. Users will be able to play with DORIS and to see how real-world Web services can be mapped to large knowledge bases of the Semantic Web.
Temporal Role Annotation for Named Entities
Natural language understanding tasks are key to extracting structured and semantic information from text. One of the most challenging problems in natural language is ambiguity, and resolving it requires context, including temporal information. This paper focuses on the task of extracting temporal roles from text, e.g. CEO of an organization or head of a state. A temporal role has a domain, which may resolve to different entities depending on the context and especially on temporal information, e.g. CEO of Microsoft in 2000. We focus on temporal role extraction as a precursor for temporal role disambiguation. We propose a structured prediction approach based on Conditional Random Fields (CRF) to annotate temporal roles in text, relying on a rich feature set that extracts syntactic and semantic information from text.
We perform an extensive evaluation of our approach based on two datasets. In the first dataset, we extract nearly 400k instances from Wikipedia through distant supervision, whereas in the second dataset, a manually curated ground truth consisting of 200 instances is extracted from a sample of The New York Times (NYT) articles. Finally, the proposed approach is compared against baselines, showing significant improvements on both datasets.
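A linear-chain CRF consumes a feature map per token. The sketch below shows the kind of lexical and contextual features such a tagger might use for role annotation; the feature names and the small role gazetteer are hypothetical, not the paper's actual feature set.

```python
# Sketch of a per-token feature map as fed to a linear-chain CRF for
# temporal role annotation. The gazetteer and feature names are illustrative.

ROLE_LEXICON = {"ceo", "president", "head", "chancellor"}  # hypothetical gazetteer

def token_features(tokens, i):
    """Lexical and contextual features for token i."""
    w = tokens[i]
    feats = {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),
        "word.isdigit": w.isdigit(),
        "suffix3": w.lower()[-3:],
        "in_role_lexicon": w.lower() in ROLE_LEXICON,
        "BOS": i == 0,                  # beginning of sentence
        "EOS": i == len(tokens) - 1,    # end of sentence
    }
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    return feats

sent = "Bill Gates was CEO of Microsoft in 2000".split()
print(token_features(sent, 3)["in_role_lexicon"])  # True
```

Such feature dictionaries can be passed directly to CRF toolkits that accept string-valued features; the actual model would add the syntactic and semantic features described above.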
Leveraging Mathematical Subject Information to Enhance Bibliometric Data
The field of mathematics is known to be especially challenging from a bibliometric point of view. Its bibliographic metrics are especially sensitive to distortions and are heavily influenced by the subject and its popularity. Therefore, quantitative methods are prone to misrepresentations and need to take subject information into account. In this paper we investigate how the mathematical bibliography of the abstracting and reviewing service Zentralblatt MATH (zbMATH) could further benefit from the inclusion of mathematical subject information from the Mathematics Subject Classification (MSC2010). Furthermore, the mappings of MSC2010 to Linked Open Data resources have been upgraded and extended to also benefit from semantic information provided by DBpedia.
SOFYA: Semantic on-the-fly Relation Alignment
Recent years have seen the rise of Web data, in particular Linked Data, with, up to now, more than 1000 datasets in the Linked Open Data Cloud (LOD). These datasets are mostly of entity-centric nature and are highly heterogeneous in terms of domains, language, schema, etc. Hence, the vision of uniformly querying such resources in the LOD has a long way to go. While equivalent entity instances across datasets are often linked by sameAs links, relations from different datasets and schemas are usually not aligned. In this paper, we propose an online instance-based relation alignment approach. The alignment may be performed during query execution and requires only partial information from the datasets. We align relations to a target dataset using association rule mining approaches, sampling for equivalent entity instances with two main sampling strategies. Preliminary experiments show that we are able to align relations with high accuracy, even if accessing the entire datasets is impossible or impractical.
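The instance-based idea can be sketched as an association rule over shared (subject, object) pairs: if most pairs connected by a source relation are also connected by a target relation, propose an alignment. The toy triples and the simple confidence threshold below are illustrative assumptions, not the system's actual mining procedure.

```python
# Sketch of instance-based relation alignment: over a sample of entity pairs
# (assumed already matched via sameAs links), count how often a source
# relation's (subject, object) pairs are also connected by each target
# relation, and keep rules whose confidence passes a threshold.

from collections import defaultdict

source = [("q:p1", "Berlin", "Germany"), ("q:p1", "Paris", "France")]
target = [("t:locatedIn", "Berlin", "Germany"), ("t:locatedIn", "Paris", "France"),
          ("t:capitalOf", "Paris", "France")]

def align(source_triples, target_triples, min_conf=0.8):
    src_pairs = defaultdict(set)
    for rel, s, o in source_triples:
        src_pairs[rel].add((s, o))
    tgt_pairs = defaultdict(set)
    for rel, s, o in target_triples:
        tgt_pairs[rel].add((s, o))
    rules = {}
    for r1, pairs in src_pairs.items():
        for r2, tpairs in tgt_pairs.items():
            conf = len(pairs & tpairs) / len(pairs)  # confidence of r1 => r2
            if conf >= min_conf:
                rules[(r1, r2)] = conf
    return rules

print(align(source, target))  # {('q:p1', 't:locatedIn'): 1.0}
```

Because confidence is computed over a sample rather than the full datasets, this style of alignment works even when accessing the entire datasets is impractical, as the abstract notes.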
TableNet: An approach for determining fine-grained relations for wikipedia tables
We focus on the problem of interlinking Wikipedia tables with fine-grained table relations: equivalent and subPartOf. Such relations allow us to harness semantically related information by accessing related tables or the facts therein. Determining the type of a relation is not trivial: relations depend on the schemas, the cell values, and the semantic overlap of the cell values in tables. We propose TableNet, an approach for interlinking tables with subPartOf and equivalent relations. TableNet consists of two main steps: (i) for any source table, an efficient algorithm finds candidate related tables with high coverage, and (ii) a neural approach, based on the table schemas and data, determines the fine-grained relation with high accuracy. Based on an extensive evaluation with more than 3.2M tables, we show that TableNet retains more than 88% of relevant table pairs and assigns table relations with an accuracy of 90%.
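The first step, candidate generation, can be illustrated with a simple schema-overlap filter: propose table pairs whose column headers overlap enough, and leave the fine-grained classification to a downstream model. The Jaccard measure, the threshold, and the toy tables below are simplifying assumptions, not TableNet's actual algorithm.

```python
# Sketch of candidate generation for table interlinking: propose related
# tables by the Jaccard overlap of column headers. The tables and the
# threshold are illustrative; the real system also uses cell values.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

tables = {
    "T1": ["Country", "Capital", "Population"],
    "T2": ["Country", "Capital", "Area"],
    "T3": ["Player", "Club", "Goals"],
}

def candidates(source, threshold=0.4):
    """Candidate tables whose schema overlaps the source table's schema."""
    src = tables[source]
    return [t for t in tables
            if t != source and jaccard(src, tables[t]) >= threshold]

print(candidates("T1"))  # ['T2']
```

The surviving pairs would then be passed to the neural classifier, which decides between equivalent, subPartOf, or no relation.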
TECNE: Knowledge based text classification using network embeddings
Text classification is an important and challenging task due to its application in various domains such as document organization and news filtering. Several supervised learning approaches have been proposed for text classification. However, most of them require a significant amount of training data, and manually labeling such data can be very time-consuming and costly. To overcome the problem of labeled data, we demonstrate TECNE, a knowledge-based text classification method using network embeddings. The proposed system does not require any labeled training data to classify an arbitrary text. Instead, it relies on the semantic similarity between entities appearing in a given text and a set of predefined categories to determine the category to which the given document belongs.
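The unsupervised scoring step might look like the sketch below: score each predefined category by the average similarity between its embedding and the embeddings of entities linked in the text. The 2-d vectors and the entity list are made-up assumptions standing in for real network embeddings and an entity linker's output.

```python
# Minimal sketch of knowledge-based, label-free classification: pick the
# category whose embedding is most similar, on average, to the embeddings
# of the entities mentioned in the document. All vectors are toy values.

import math

ENTITY_VEC = {"Lionel_Messi": [0.9, 0.1], "FC_Barcelona": [0.8, 0.3]}
CATEGORY_VEC = {"Sports": [0.9, 0.2], "Politics": [0.1, 0.9]}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def classify(entities):
    """Pick the category most similar, on average, to the text's entities."""
    def score(cat):
        return sum(cos(ENTITY_VEC[e], CATEGORY_VEC[cat]) for e in entities) / len(entities)
    return max(CATEGORY_VEC, key=score)

print(classify(["Lionel_Messi", "FC_Barcelona"]))  # Sports
```

No labeled documents are needed: only the pretrained embeddings and the category names, which is the point of the knowledge-based approach.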
Machine Learning against Hearing Loss: Predicting the Success of Cochlear Implant Treatment
So-called cochlear implants are not yet very widespread among people with hearing loss, partly because the degree of speech comprehension achievable with an implant is difficult to estimate before the operation. In a project funded by the Volkswagen Foundation, researchers at Hannover Medical School (MHH), Technische Universität Braunschweig, and the L3S Research Center aim to analyze patient data in order to better predict the success of cochlear implant treatment.
Approaches Towards Unified Models for Integrating Web Knowledge Bases
My thesis aims at the automatic integration of new Web services into a knowledge base. For each method of a Web service, a view is computed automatically. The view is represented as a query on the knowledge base. Our algorithm also computes an XSLT transformation function, associated with the method, that can transform call results into a fragment conforming to the schema of the knowledge base. The novelty of our approach is that the alignment relies only on instances; it does not depend on the names of concepts or on the constraints defined by the schema. This makes it particularly relevant for Web services that are currently published on the Web, because these services use the REST protocol, which does not allow the publication of schemas. In addition, JSON seems to establish itself as the standard for representing service call results. Unlike XML, JSON does not use named nodes, so traditional alignment algorithms are deprived of the concept names on which they rely.